Specialized Metabolism of Gordonia Genus: An Integrated Survey on Chemodiversity Combined with a Comparative Genomics-Based Analysis

Members of the phylum Actinomycetota (formerly Actinobacteria) have historically been the most prolific providers of small bioactive molecules. Although the genus Streptomyces is the best-known member in this regard, other genera, such as Gordonia, have shown interesting potential in their specialized metabolism. Thus, we combined herein the results of a comprehensive literature survey on metabolites derived from Gordonia strains with a comparative genomic analysis to examine the potential of the specialized metabolism of the genus Gordonia. Thirty Gordonia-derived compounds of different classes (i.e., alkaloids, amides, phenylpropanoids, and terpenoids) were gathered, exhibiting antimicrobial and cytotoxic activities; several have also been isolated from Streptomyces (e.g., actinomycin, nocardamin, diolmycin A1). With the genome data, we estimated an open pan-genome of 57,901 genes, most of them being part of the cloud genome. Regarding BGC content, 531 clusters were found, with Terpene, RiPP-like, and NRPS clusters being the most frequent. Our findings show that Gordonia is a poorly studied genus in terms of its specialized metabolism and potential applications. Nevertheless, given their BGC content, Gordonia spp. are a valuable biological resource that could expand the chemical spectrum of the phylum Actinomycetota, offering novel BGCs that may inspire synthetic biology approaches and further biotechnological use. Therefore, further studies and more efforts should be made to explore different environments and evaluate other bioactivities.


Introduction
The phylum Actinomycetota is a priceless bioresource of active metabolites due to its diversified specialized metabolism [1]. Among over 250 genera described within the phylum, Streptomyces is the most prominent [1], especially for providing most of the clinically used antibiotics [2]. Moreover, some studies have reported that species of this genus produce compounds with bioactivities other than antimicrobial, such as anticancer, immunomodulatory, and antioxidant [2,3]. Consequently, research on the bioprospecting of streptomycetes has continued to attract interest. Although non-Streptomyces actinomycetes (also known as rare actinomycetes) have been comparatively neglected [4], evidence has recently been growing for their potential to provide novel bioactive compounds [5]. Some rare actinomycete genera showing high chemodiversity in their specialized metabolism are Corynebacterium, Nocardiopsis, Saccharomonospora, Pseudonocardia, and Gordonia [6].
The genus Gordonia was established in 1988 [7] and is characterized by mycelial growth with fragmentation into rod-shaped elements or cocci, pigmented colonies (e.g., yellow, orange, red), and a G + C content between 63 and 69% [8]. Although some strains are opportunistic human pathogens (e.g., Gordonia sputi, Gordonia bronchialis, and Gordonia terrae [9]), several species have been reported to play crucial ecological roles, notably as symbionts [10]. This aspect is relevant in the natural products field, considering the importance of symbiotic bacteria in the chemical ecology of their hosts [11].
Regarding the bioprospecting potential of Gordonia species, a large amount of literature on bioremediation has involved Gordonia strains [10]. However, their specialized metabolites' isolation and bioactivity profiling remain poorly understood [12]. This fact constitutes a promising field for research, mainly due to the possibility of finding unique chemical structures or the opportunity to discover more sustainable sources of bioactive natural products (i.e., through microbial biotechnology) [13][14][15].
Natural products have been the primary drug source [16]. Recently, the role of microorganisms as producers of high-value-added products (e.g., bioactive compounds) has gained much attention [17]. Among the various reasons, it is worth emphasizing its contribution to the bioeconomy and its engagement with the Sustainable Development Goals (SDGs) [18,19]. Considering that pharmaceuticals are at the top of the bioeconomy value pyramid [20], microbial biotechnology is a crucial component in the development of a bio-based economy [21] and, therefore, to achieve sustainable economic development (i.e., SDG8) [19,22], while contributing to the good health and well-being (i.e., SDG3) [19,23]. In this context, exploring the available biological resources and their hidden potential as thoroughly as possible becomes a high-priority research topic.
As a result of new advances in genome mining tools, such as antiSMASH [24] and BiG-SCAPE [25], the spread of whole-genome sequencing projects and the resulting large volume of data have opened the possibility of innovative alternatives for natural product research [26]. The enzymes involved in specialized metabolite biosynthesis pathways are encoded in genes arranged in clusters, i.e., biosynthetic gene clusters (BGCs) [27]. The diversity and abundance of these BGCs (namely, the BGC space) are linked to a certain taxon-dependent chemical space [1,28]. Therefore, mapping the arsenal of BGCs associated with an organism or group allows research to be rationally prioritized toward the most promising bioresources of bioactive compounds. The information on Gordonia is very limited compared to Streptomyces. However, considering the ecological niches that have been described for the genus [10] and its average genome size (i.e., ca. 5.30 Mb [10], which is related to the number of BGCs [1]), it is worthwhile to comprehensively explore the diversity and potential of Gordonia's specialized metabolism and its bioprospecting profile.
To our knowledge, there are four published reviews related to the bioprospecting potential of Gordonia [8][9][10][29]. These articles are narrative reviews mainly focused on the ecological roles of Gordonia. Notably, they agree on its intriguing biotechnological potential. However, several critical questions for bioprospecting initiatives are not addressed in these compilations (e.g., What is the dimension of the chemical space described so far for Gordonia? Are Gordonia-derived metabolites like those of other actinomycetes, such as Streptomyces? How diverse is the specialized metabolite biosynthetic machinery of Gordonia?). Therefore, considering that a comprehensive and rigorous synthesis of evidence is required to design new and successful studies, we conducted a literature survey to address these issues and methodically collect Gordonia-derived metabolites, complemented by an in silico determination of promising targets using genome mining.
The literature search was performed following the PRISMA-S guide (Preferred Reporting Items for Systematic reviews and Meta-Analyses literature search extension) [30]. It was carried out in Scopus and Web of Science since these are two of the most extensive databases in terms of scientific literature [31]. The following search string was developed to find the available literature on compounds isolated from Gordonia strains, regardless of the evaluated biological potential: gordonia AND (extract* OR compound OR metabolite OR structure). The first search was conducted on 15 October 2021. A new search was performed on 16 March 2022 to examine articles published afterward. In each search, the results were limited to original articles using the setting options of each database. Then, the reviews that reached the screening stage were manually excluded.
The title and abstract of each article were screened independently and double-blind by two authors using the Rayyan web-based tool [32]. Articles selected by both authors moved directly to the full-text reading stage. Discrepancies were resolved by discussion between the two authors, and a third author made the final decision whenever necessary. Inclusion criteria were as follows: (i) the study involves an actinomycete strain of the genus Gordonia, and (ii) the study reports the identification of a compound from a Gordonia strain. Studies evaluating the biotransformation capacity of Gordonia strains and co-cultures with non-actinomycetes were excluded. The PRISMA-S-based protocol of this systematic review can be consulted in Table S1.

Data Collection
An online form was developed to survey each paper that passed to the full-text reading stage for data collection. A preliminary version of the form was evaluated with seven randomly selected articles independently and double-blind by two authors to build the form. Once this phase was completed, the form was adjusted to incorporate as much data as possible for the aims of this study. The adjusted form was established as the final version to be applied. The articles were coded before being surveyed with the form. The structures of the chemical compounds reported in each paper were coded as follows: article code_n, where n corresponded to ascending Arabic numerals (i.e., 01, 02 . . . n) depending on the number of compounds reported in each paper. A list of the variables incorporated into the final version of the form can be reviewed in Table S2.

Chemoinformatics Analysis
The retrieved chemical structures were converted into SMILES (simplified molecular-input line-entry system) notation [33] for the subsequent chemical space exploration using the Osiris DataWarrior v5.5.0 software [34] and SwissADME [35]. The compounds were classified into alkaloids, amides, terpenoids, and phenylpropanoids according to their building blocks. The compounds were filtered and grouped according to the FragFp fingerprint descriptor for chemical structure similarity analyses using DataWarrior. The molecular weight (MW), octanol/water partition coefficient (cLogP), number of hydrogen-bond donors and acceptors (H-donors, H-acceptors), and drug-likeness were calculated in DataWarrior. Pharmacokinetics (e.g., gastrointestinal absorption, blood-brain barrier permeability, potential inhibition of CYP enzymes, bioavailability) and medicinal chemistry friendliness parameters (e.g., PAINS alerts, Brenk alerts, lead-likeness violations, synthetic accessibility) were estimated through SwissADME.

Collection of Genome Sequences and Pan-Genome Estimation
All the Gordonia genomes used in this study were retrieved from the NCBI's Genome database. At the time of the search (17 December 2021), the database contained 254 genome assemblies, of which 39 were reference genomes. Detailed information about this dataset can be found in Table S3. The genomes were downloaded in FASTA format (.fna) and annotated with the Prokka software [36]. The pan-genome was estimated using the Roary pipeline [37] from the annotated assemblies in GFF3 format. The parameters for the run were as follows: the percentage of isolates in which a gene must be present to be considered core was set at 99%, the minimum percentage identity for sequence comparisons performed by BlastP was set at 70% [38], and the maximum number of clusters was set at 60,000. The Prokka and Roary tools were run using the web-based platform Galaxy at https://usegalaxy.org (accessed on 20 December 2021) [39]. The pan-genome was classified as open or closed according to Heaps' law [40,41]. For this, the outputs of (i) the total number of genes and (ii) the number of new genes were used. For each output, Roary reports the results of ten random iterations of the input files. The values of κ and α of the Heaps' law equation were calculated by power-law regression analyses [40]. Finally, gene annotation and classification by functional subsystems were performed according to the RAST toolkit using the PATRIC service center [42].
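The power-law regression step described above can be sketched as follows. This is a minimal illustration, assuming the common formulation n = κ·N^(−α) for the number of new genes n found when the N-th genome is added, with the pan-genome considered open when α < 1. The function names and the synthetic data are hypothetical and are not the values obtained in this study.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = k * x**b on log-log axes; returns (k, b)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]  # requires strictly positive values
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (c - mean_y) for a, c in zip(log_x, log_y))
    slope = sxy / sxx
    k = math.exp(mean_y - slope * mean_x)
    return k, slope

def heaps_parameters(genomes_added, new_genes):
    """Fit n = kappa * N**(-alpha) to new-gene counts; returns (kappa, alpha)."""
    kappa, slope = fit_power_law(genomes_added, new_genes)
    return kappa, -slope

# Synthetic, illustrative data only (assumed kappa = 1500, alpha = 0.6).
N = list(range(2, 40))
n_new = [1500.0 * x ** -0.6 for x in N]

kappa, alpha = heaps_parameters(N, n_new)
is_open = alpha < 1  # assumed openness criterion
print(f"kappa = {kappa:.1f}, alpha = {alpha:.2f}, open = {is_open}")
```

In practice, the new-gene counts would first be averaged (or the median taken) over the ten random iterations reported by Roary before fitting; that aggregation choice is also an assumption here.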

Phylogenomic Analysis
A maximum-likelihood-based tree was inferred from the multi-FASTA alignment of all core genes created in Roary. Gaps in aligned sequences were removed using Gap Strip/Squeeze v2.1.0 with 20% gap tolerance [43]. The phylogenomic tree was generated in the Galaxy platform using FastTree v2.1.10 [44]. GTR + CAT was set as the nucleotide evolution model, and the other parameters were left at their default values. The analysis included 39 nucleotide sequences and 692,625 positions in the final data set. The resulting phylogenetic tree in Newick format was uploaded and edited in MEGA v11.0.10 [45]. Finally, using the phylogenetic tree and the multiple sequence alignment, taxa were clustered according to bootstrap support (by default > 90%) and genetic distances (defined from an identity matrix of all core genome sequences computed in BioEdit v7.2.5) using ClusterPickerGUI_1.2.3 [46].
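The gap-removal step can be illustrated with a short sketch. It assumes that a 20% gap tolerance means that alignment columns in which more than 20% of the sequences carry a gap are dropped; the exact rule applied by Gap Strip/Squeeze may differ, and the function name and toy alignment are hypothetical.

```python
def strip_gappy_columns(alignment, gap_tolerance=0.20, gap_chars="-"):
    """Drop alignment columns whose gap fraction exceeds gap_tolerance."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    kept = []
    for col in range(length):
        gaps = sum(1 for seq in alignment if seq[col] in gap_chars)
        if gaps / len(alignment) <= gap_tolerance:
            kept.append(col)
    return ["".join(seq[c] for c in kept) for seq in alignment]

# Toy alignment of five sequences (hypothetical data).
toy = ["ATG-CA", "ATGTCA", "AT--CA", "ATG-CA", "ATGACA"]
stripped = strip_gappy_columns(toy)
print(stripped)
```

In this toy case the fourth column (three gaps out of five sequences) is removed, while the third column (one gap out of five, i.e., exactly 20%) is kept.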

BGC Identification and Similarity Comparison
Gordonia genomes were analyzed using antiSMASH (antibiotics and Secondary Metabolite Analysis Shell) v6.0 to predict and annotate BGCs [24]. The resulting GenBank files (.gbk extension) were then used as input for the BiG-SCAPE v1.1.2 pipeline using the default parameters [25]. BiG-SCAPE runs a pairwise analysis of the identified BGCs and defines gene cluster families (GCFs) from a calculated similarity matrix. The resulting networks were imported into Cytoscape v3.9.1 for visualization and analysis.

Data Analysis
Data were entered into Microsoft Excel v2203 spreadsheets for pre-filtering. Descriptive statistical analysis of the data (i.e., calculation of measures of central tendency (mean, median) and dispersion) and figure construction (e.g., histograms, pie charts, box-and-whisker plots) were performed in GraphPad Prism v9.0.

General Findings
The initial literature search resulted in 493 documents after removing duplicates and documents other than original articles (Figure 1). After screening titles and abstracts, 468 papers were discarded according to the inclusion/exclusion criteria. Of the remaining twenty-five articles, one could not be retrieved for full-text evaluation, eight did not identify any metabolites, five had a different scope than the one proposed (e.g., optimization of culture conditions), and three involved the evaluation of oligosaccharides (i.e., they were not small molecules). Finally, the review identified eight studies involving specialized metabolites obtained from species of the genus Gordonia (Figure 1).
Figure 1. Flow diagram of the literature search and study selection [47]. The flow diagram was constructed using the PRISMA2020 online tool [48]. Detailed information on the literature search is presented in the PRISMA-S checklist in Table S1.
Table 1 provides an overview of the Gordonia strains used in the included studies. In five out of eight studies, the strains were defined at the species level, while the others were at the genus level. Although each study used different strains, we observed two strains related to G. terrae (i.e., AIST-1 and WA 4-31) and another one (i.e., 647 W.R.1a.05) closely related to G. terrae according to its 16S ribosomal gene sequence. Most strains (62.5%) were isolated as free-living forms, while the rest were isolated from a host (i.e., Gordonia sp. 647 W.R.1a.05, Gordonia sp. UA19, and G. terrae WA 4-31, isolated from a cone snail, sponge, and cockroach, respectively). The strains were isolated from different environments, a characteristic usually attributed to Actinobacteria. Interestingly, given the growing research attention on marine bacteria, three strains were isolated from marine ecosystems (i.e., G. terrae AIST-1, Gordonia sp. 647 W.R.1a.05, and Gordonia sp. UA19).
Regarding the bioprospecting potential, in 5 studies (62.5%), the biological activity was explored, and the antimicrobial capacity was the most frequent (Table 1). In most of the included studies (62.5%), at least one compound was isolated. In fact, in the studies of Schneider et al. [51], Ma et al. [56], Takaichi et al. [53], and Lin et al. [54], 3, 4, 7, and 12 metabolites were reported, respectively. The characteristics of the metabolites retrieved from the studies included in this review are presented in the following section.
The whole compound set (n = 30) was grouped by similarity relationships to explore the chemical diversity ( Figure 3). Eight compounds (i.e., 1, 2, 7, 8, 9, 12, 21, and 22) constitute unique fingerprints and, therefore, could not be clustered. The remaining compounds were distributed in five clusters of pairs, and four clusters comprised three compounds.
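The similarity grouping described above can be sketched under stated assumptions. The sketch treats each fingerprint as a set of "on" fragment bits, compares compounds with the Tanimoto coefficient, and groups compounds whose similarity reaches a cutoff. The source does not state the similarity measure or cutoff applied in DataWarrior, so the Tanimoto coefficient, the 0.6 threshold, the single-linkage rule, and the toy fingerprints are all assumptions for illustration.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def similarity_groups(fingerprints, threshold):
    """Single-linkage groups: compounds linked when similarity >= threshold."""
    names = list(fingerprints)
    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if tanimoto(fingerprints[x], fingerprints[y]) >= threshold:
                parent[find(x)] = find(y)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values(), key=lambda g: (-len(g), g))

# Hypothetical fingerprints (bit indices), not real FragFp output.
toy = {
    "cpd_A": {1, 2, 3, 4},
    "cpd_B": {1, 2, 3, 5},
    "cpd_C": {7, 8, 9},
    "cpd_D": {7, 8, 9, 10},
    "cpd_E": {20, 21},
}
groups = similarity_groups(toy, threshold=0.6)
singletons = [g[0] for g in groups if len(g) == 1]
print(groups, singletons)
```

As a consistency check on the reported grouping, eight unclustered compounds plus five pairs and four triplets account for the whole set: 8 + 5 × 2 + 4 × 3 = 30.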
Since horizontal and vertical gene transfer are involved in the evolution of the biosynthetic pathways of specialized metabolites [57], we aimed to answer whether Gordonia-derived compounds are similar to those reported for Streptomyces. To this end, we conducted a similarity analysis within the StreptomeDB v3.0 database [58]. In total, 12 clusters involving 17 Gordonia-derived compounds and 111 Streptomyces-derived compounds were formed (Figure 4). These clusters included different compound types (i.e., alkaloids, amides, phenylpropanoids, and terpenoids). In Figure S1, we highlight some examples of compounds derived from NRPS pathways and indole alkaloids. Within the Gordonia compounds, 1, 5, 6, 7, 10, 11, 15, and 20 have also been isolated from Streptomyces. Most of the Streptomyces-derived compounds are associated with various biological activities of high interest, such as antibacterial, antifungal, antioxidant, anti-inflammatory, antifouling, neuroprotective, cytotoxic, and enzyme-inhibitory activities (Table S5). This fact indicates the broad untapped potential of Gordonia's specialized metabolism.
Physicochemical parameters such as molecular mass (MW), octanol/water partition coefficient (cLogP), and the number of hydrogen acceptor/donor groups (H-acceptors; H-donors) play a determining role in drug development [59]. For the Gordonia-derived compounds, we found a wide distribution in the values of these physicochemical parameters (i.e., MW ranged from 198.22 to 1269.42, cLogP ranged from −3.17 to 14.32, H-acceptors ranged from 1 to 29, and H-donors ranged from 0 to 13; Figure 5a-c). However, the drug-likeness scores supported their druggable potential, since most of the compounds gathered from Gordonia fell into positive values (Figure 5d), indicating a resemblance to marketed drugs. We also evaluated the pharmacokinetics and medicinal chemistry friendliness by predictive models (Figure 5e-i). Compounds 10 and 11 were excluded, as they exceeded the size limitations of SwissADME.
Notably, most of the compounds (98.86%) showed a high probability of oral bioavailability (Figure 5e), which correlates with their degree of compliance with Lipinski's rules (Table S6). Regarding medicinal chemistry friendliness, the compounds showed encouraging values in parameters such as PAINS alerts, Brenk alerts, and lead-likeness (Figure 5f-h, respectively). However, from the point of view of their synthetic accessibility, no compound scored below 2 (very easy), and several compounds (28.57%) scored above 5 (Figure 5i). This aspect is one of the most critical challenges in natural product research. One possible approach to address this issue is using microorganisms as small factories (microbial cell factories) for these high-value-added compounds [60]. Thus, a comprehensive understanding of the latent diversity of a microorganism's specialized metabolism becomes a key factor in advancing along this road. To provide evidence for the usefulness of Gordonia as a promising source of bioactive compounds, we further analyzed its genomic diversity, emphasizing its specialized metabolism.
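The Lipinski compliance mentioned above can be illustrated with a minimal check of the classic rule-of-five thresholds (MW ≤ 500, logP ≤ 5, H-donors ≤ 5, H-acceptors ≤ 10). The function name is hypothetical, SwissADME's own implementation may use a different logP model and cutoff, and the two inputs below simply combine the lower and upper extremes of the reported ranges; they do not correspond to actual compounds in the set.

```python
def lipinski_violations(mw, clogp, h_donors, h_acceptors):
    """Return the list of classic rule-of-five criteria that are violated."""
    checks = {
        "MW > 500": mw > 500,
        "cLogP > 5": clogp > 5,
        "H-donors > 5": h_donors > 5,
        "H-acceptors > 10": h_acceptors > 10,
    }
    return [name for name, failed in checks.items() if failed]

# Extremes of the reported descriptor ranges, combined for illustration only.
low_end = lipinski_violations(mw=198.22, clogp=-3.17, h_donors=0, h_acceptors=1)
high_end = lipinski_violations(mw=1269.42, clogp=14.32, h_donors=13, h_acceptors=29)

print(low_end)   # no violations expected for the lower extremes
print(high_end)  # all four criteria violated for the upper extremes
```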

Gordonia Pan-Genome
The number of available genomes of Gordonia spp. (n = 39) made it possible to explore its gene repertoire. The pan-genome size was estimated at 57,901 genes, and since it is shown to follow Heaps' law (i.e., α < 1, 0 < γ < 1, Figure 6) [40], it was classified as open. A total of 693 (1.20%) genes comprised the core genome, 170 (0.31%) the softcore, 4822 (8.33%) the shell, and 52,209 (90.17%) the cloud genome (Figure 7a). In the latter, remarkably, 38,209 genes (i.e., 73.18% of cloud genome) were unique, found only once in one of the analyzed genomes. The complete matrix of genes per genome (i.e., per species) can be found in Table S7.
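The partition into core, softcore, shell, and cloud genes can be sketched as follows. The 99% core threshold is the one given in the methods; the 95% and 15% boundaries are Roary's documented default category limits and are assumed to apply here. The function name is hypothetical.

```python
def pangenome_category(n_present, n_genomes):
    """Assign a gene to a pan-genome category from its genome prevalence."""
    frac = n_present / n_genomes
    if frac >= 0.99:
        return "core"
    if frac >= 0.95:
        return "softcore"
    if frac >= 0.15:
        return "shell"
    return "cloud"

N_GENOMES = 39
categories = {k: pangenome_category(k, N_GENOMES) for k in range(1, N_GENOMES + 1)}

# With 39 genomes, core genes must be present in all 39 genomes, softcore
# genes in 38, shell genes in 6 to 37, and cloud genes in 5 or fewer.
PAN_GENOME_SIZE = 57901

def share(count):
    """Percentage of the pan-genome, rounded to two decimals."""
    return round(100 * count / PAN_GENOME_SIZE, 2)

print(share(693), share(4822), share(52209))  # core, shell, cloud shares
```

The reported core, shell, and cloud percentages (1.20%, 8.33%, and 90.17%) follow directly from the gene counts and the pan-genome size.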


Figure 7. Overview of the assignments to functional subsystems of the Gordonia pan-genome. (a) Distribution of genes in core/softcore, shell, and cloud genomes. (b) Doughnut plot of the functional assignment of genes in the core, softcore, shell, and cloud genome (from the inner to the outer ring, respectively). (c) Pie chart of the distribution of genes assigned to specialized metabolism in shell and cloud genomes (no genes were found within core and softcore genomes).
Regarding the functional classification of the predicted genes, the outstanding roles in the core genome were those related to metabolism and protein processing (Figure 7b, 16% each). In the softcore, the distribution was slightly more homogeneous except for genes involved in metabolism. Concerning the shell and cloud genome, the genes assigned to metabolism were clearly preponderant. As expected, most of the genes related to specialized metabolism (i.e., genes that constitute the BGCs) were found in the cloud genome (Figure 7c). To comprehensively analyze the content of BGCs, we mined the Gordonia genomes with the specialized metabolism-dedicated antiSMASH v6.0 tool. The results are shown in the section below.
A phylogenetic tree was inferred by the maximum-likelihood method to explore the evolutionary relationship based on the core genome of the 39 species of Gordonia. According to the bootstrap values, the constructed tree showed strong support for every branch (Figure 8). Considering the similarity between the core genome sequences (Table S8), taxa with a support threshold > 90% and genetic distance < 20.7% were clustered. Seven groups were established as follows: cluster I with nine species, clusters II-V with two species each, cluster VI with seven species, and cluster VII with fourteen species. Interestingly, the species G. jinhuaensis did not group into any cluster; G. paraffinivorans and G. desulfuricans were the species with the highest level of core genome similarity to it (i.e., 77.9% and 77.8%, respectively; Table S8).
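The clustering criterion above can be made concrete with a small sketch. It assumes that the genetic distance is simply 100% minus the pairwise core-genome identity, which is an interpretation of the identity-matrix approach and not a statement of how ClusterPicker computes distances internally. The function name and the bootstrap values passed in are hypothetical.

```python
SUPPORT_THRESHOLD = 90.0   # percent bootstrap support, from the text
DISTANCE_THRESHOLD = 20.7  # percent genetic distance, from the text

def can_cluster(identity_percent, support_percent):
    """True when a pair meets both the support and the distance criteria."""
    distance = 100.0 - identity_percent  # assumed definition of distance
    return support_percent > SUPPORT_THRESHOLD and distance < DISTANCE_THRESHOLD

# G. jinhuaensis versus its two most similar species (identities from the
# text); the bootstrap value of 100 is an illustrative placeholder.
vs_paraffinivorans = can_cluster(77.9, 100.0)
vs_desulfuricans = can_cluster(77.8, 100.0)
print(vs_paraffinivorans, vs_desulfuricans)
```

Under this assumption, a maximum identity of 77.9% corresponds to a distance of about 22.1%, above the 20.7% limit, which is consistent with G. jinhuaensis remaining unclustered. The group sizes also add up to the full dataset: 9 + 4 × 2 + 7 + 14 = 38 clustered species plus G. jinhuaensis gives 39.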

Gordonia BGC Diversity
Gordonia genome mining identified 531 clusters classified into 39 different BGC types (Table S9). Figure 9a,b show the 23 most frequent BGC types and the number and type of BGCs detected for each Gordonia species, respectively. Regarding BGC richness, Gordonia shandongensis showed the lowest number (i.e., 8), whereas Gordonia soli showed the highest (i.e., 23) (Figure 9b). Notably, the median and mode were both 13 BGCs, RiPP-like and Terpene BGCs were found in all genomes surveyed, and NRPS was the BGC type with the highest prevalence (i.e., 22.6%). Moreover, comparing the sizes of the RiPP-like and Terpene clusters (the types identified in all the genomes analyzed), the Terpene clusters varied widely in size, in contrast to the RiPP-like clusters (Figure 9c).
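A few derived figures follow from the numbers reported above and may help when comparing species. The sketch below only performs arithmetic on values stated in the text; the NRPS cluster count is an approximation that assumes the 22.6% prevalence refers to the share of all 531 clusters.

```python
TOTAL_BGCS = 531        # clusters identified across all genomes
N_SPECIES = 39          # genomes (one per species) analyzed
MIN_BGCS, MAX_BGCS = 8, 23
NRPS_SHARE = 0.226      # reported prevalence of NRPS clusters

mean_bgcs = round(TOTAL_BGCS / N_SPECIES, 1)       # average BGCs per genome
bgc_range = MAX_BGCS - MIN_BGCS                    # spread between species
approx_nrps = round(NRPS_SHARE * TOTAL_BGCS)       # assumed share of all clusters

print(mean_bgcs, bgc_range, approx_nrps)
```

The mean of about 13.6 BGCs per genome sits close to the reported median and mode of 13, and the interspecies range of 15 clusters is the reference against which intraspecies variation can be compared.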

Figure 9. BGC content of the Gordonia genomes. (a) The 23 most frequent BGC types; (b) number and type of BGCs detected for each Gordonia species (for a detailed list, see Table S9); (c) size in kb of the BGCs occurring in all the genomes studied (i.e., RiPP-like and Terpene).
Since BGC content varied considerably (i.e., a range of 15 clusters) among Gordonia species, we interrogated different strains of the same species to see whether BGC content remained relatively stable at the intraspecies level. We selected the species Gordonia polyisoprenivorans (seven strains), G. rubripertincta (six strains), and G. terrae (eight strains), as they were among those with the highest number of genomes reported from different strains. BGC content varied far less among strains of the same species (i.e., ranges of four, seven, and four for G. polyisoprenivorans, G. rubripertincta, and G. terrae, respectively), and the types of BGCs were more homogeneous within each species (Figure 10). However, some BGCs are found only in some strains but not in all strains within each species. For instance, G. rubripertincta strains NBRC 101908 and SD5 contain unique NRPS hybrid BGCs (Table S10). Notably, NRPS was the cluster type with the most copies in all strains (Figure 10).
Figure 10. BGC content of strains of G. polyisoprenivorans, G. rubripertincta, and G. terrae (for a detailed list, see Table S10).

Discussion
Natural resources have historically been the primary source of bioactive compounds for developing products with industrial applications, mainly in drug discovery [16]. Research on marine organisms (e.g., sponges, corals, algae) has recently shown exciting potential for new compounds. However, it has been reported that the discovery of novel compounds is more closely associated with new source organisms [12], among which microorganisms play a preponderant role [61]. Herein, we have rigorously scrutinized the bioprospecting potential of actinomycetes of the genus Gordonia based on a systematic review of the literature and a comparative genomic analysis of 39 species. The review showed that Gordonia species are poorly explored bioresources in terms of their specialized metabolism, in contrast to the potential shown by the analysis of their biosynthetic machinery (i.e., BGC content diversity).
The compounds recovered from the literature included structurally diverse metabolites from the main biosynthetic pathways (i.e., nitrogen-containing compounds, phenylpropanoids, and terpenoids), despite coming from only eight strains (Table 1). Interestingly, the strains came from different environments (e.g., marine, terrestrial), including free-living and host-associated forms, which is an indicator of adaptive success and, therefore, of the plasticity of their genotype. This diversity of ecological niches of Gordonia has already been recognized [10] and, as in other microorganisms, critically relies on developing a sophisticated specialized metabolism [62].
Regarding the drug-like physicochemical characteristics of the Gordonia-derived compounds, encouraging features and values were noticed (Figure 5). This is consistent with the fact that several actinomycete-derived compounds have led to approved drugs (e.g., tigecycline, everolimus, telithromycin, miglustat, daptomycin, amrubicin, biapenem, ertapenem, pimecrolimus, and gemtuzumab ozogamicin [67]). Although most of them have been isolated from Streptomyces, other genera can also produce highly interesting bioactive compounds. For instance, telithromycin is a semisynthetic derivative of erythromycin, which is produced by Saccharopolyspora erythraea, and calicheamicin γ1 (the cytotoxic agent in gemtuzumab ozogamicin) has been isolated from Micromonospora echinospora [61]. Other rare actinomycetes from which bioactive compounds have been isolated are the genera Actinoallomurus, Allostreptomyces, Streptosporangium, Polymorphospora, Lechevalieria, Mumia, Actinomadura, and Amycolatopsis [68]. This makes Gordonia spp., as rare actinomycetes, another valuable source for discovering small bioactive molecules with characteristics suitable for drug development. However, as with most natural products, their synthesis is challenging (inferred from the SwissADME analysis; Figure 5i). In this regard, the identification and availability of potential microbial factories becomes critical for overcoming this experimental barrier. Therefore, defining the true potential of Gordonia as a source of bioactive specialized metabolites could contribute significantly to the discovery of renewable bioresources of compounds with pharmaceutical interest.
Concerning the pan-genome that we estimated for the genus Gordonia, the alpha (α = 0.421 ± 0.031) and gamma (γ = 0.631 ± 0.006) values are similar to those reported for Streptomyces in the work of Caicedo-Montoya et al. (i.e., α = 0.45 ± 0.009 and γ = 0.60 ± 0.002) [38]. However, it should be clarified that our work was not intended to establish the Gordonia pan-genome accurately but to assess the diversity of its genotypic repertoire, especially by sizing its cloud genome, where specialized metabolism is understood to be contained. Indeed, while no genes involved in specialized metabolism were found in the core or soft-core genome, most were found in the cloud genome and some in the shell genome. Thus, given the open pan-genome of Gordonia, it was intriguing to explore the richness of the specialized metabolites that might be linked to this genus.
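For readers unfamiliar with these parameters, the sketch below shows how a γ exponent of this kind can be estimated, assuming the commonly used Heaps' law formulation in which pan-genome size grows as P(N) = κN^γ with the number of genomes N, and γ > 0 indicates an open pan-genome. The gene counts are synthetic, and the function name and the value of κ are hypothetical; this is not the pipeline used in this study.

```python
import math

def fit_power_law(n_genomes, values):
    """Least-squares fit of values = kappa * N**exponent in log-log space."""
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(v) for v in values]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    exponent = sxy / sxx                      # slope of the log-log line
    kappa = math.exp(y_mean - exponent * x_mean)  # intercept, back-transformed
    return kappa, exponent

# Synthetic pan-genome sizes generated from P(N) = 5000 * N**0.63 (illustrative only).
genomes = list(range(1, 40))
pan_sizes = [5000 * n ** 0.63 for n in genomes]

kappa, gamma = fit_power_law(genomes, pan_sizes)
is_open = gamma > 0  # a growing pan-genome under this formulation
print(f"kappa = {kappa:.0f}, gamma = {gamma:.2f}, open pan-genome: {is_open}")
```

With real data, the pan-genome sizes would come from repeated random orderings of the genomes, and the fitted exponent would carry an uncertainty such as the ± values quoted above.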
Tools such as antiSMASH and BiG-Scape have allowed genome mining to become a promising strategy for exploring the latent natural product diversity in each bioresource [25]. Our results revealed that Gordonia harbors a high diversity of BGCs (Figure 9). Even though many are closely related (Figure 11), suggesting predominant chemotypes such as terpenoids and RiPP-like compounds, species with unique BGCs (i.e., singletons) were also found. The high prevalence of terpenoid-type BGCs in Gordonia is not surprising, given its characteristic carotenoid production [10]. As for RiPP-type BGCs, their abundance in actinomycetes, including Gordonia, has recently been reported [69]. Most species we analyzed had a single such cluster, except for G. desulfuricans, G. jacobaea, G. otitidis, G. polyisoprenivorans, G. rhizosphera, and G. sputi, in which two clusters were detected. Notably, these species were all grouped into cluster I (Figure 8) and were located in closely related clades (e.g., G. jacobaea and G. sputi are on the same branch). Since RiPPs represent a diverse family of compounds with high pharmaceutical interest, they constitute another important metabolite group that could be obtained from Gordonia strains.
Figure 11. Molecular networks of predicted BGCs in genomes of the genus Gordonia. Each node represents a BGC, and colors correspond to BGC class. The similarity matrix among BGCs was calculated using BiG-Scape, and the final networks were edited in Cytoscape v3.9.1. A complete and detailed list is available in Table S12.
Regarding unique BGCs, genome mining in Gordonia yielded promising results. When the 531 detected BGCs were classified into families, 23.93% were classified as Other (i.e., different from the well-known NRPS, Terpenes, and PKS), and most of these were singletons. For instance, the 11 BGCs found in G. jinhuaensis were singletons. Interestingly, G. jinhuaensis was the most distant species among the group analyzed (Figure 8). This outcome supports the view that, in addition to horizontal transfer, other evolutionary forces, such as functional divergence and de novo assembly, play an essential role in the diversification of BGCs and their consequent chemodiversity [70]. It also supports the idea that focusing on more distant (or rare) clades could be a valid strategy for finding novel compounds [12]. Additionally, 227 BGCs did not establish networks, indicating their uniqueness. This result is consistent with the similarity analysis of the retrieved compounds, since several of them (26.67%) have unique fingerprints. Moreover, when different strains of the same species were analyzed, strains with unique BGCs were found, so the diversity could be sustained even at the intraspecies level. This has been reported in other genera, including Streptomyces [1].
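The notion of BGCs that "did not establish networks" can be made concrete with a minimal sketch, assuming a similarity network given as a list of BGC identifiers plus a list of edges between pairs that pass a distance cutoff. The identifiers, edges, and function name below are hypothetical and do not reproduce the BiG-Scape output in Table S12.

```python
def find_singletons(bgc_ids, edges):
    """Return BGCs that have no similarity edge to any other BGC."""
    connected = set()
    for a, b in edges:
        if a != b:  # a self-loop does not link a BGC to another BGC
            connected.add(a)
            connected.add(b)
    return [bgc for bgc in bgc_ids if bgc not in connected]

# Hypothetical BGC identifiers and similarity edges (illustrative only).
bgcs = ["bgc1", "bgc2", "bgc3", "bgc4", "bgc5"]
edges = [("bgc1", "bgc2"), ("bgc2", "bgc4")]

singletons = find_singletons(bgcs, edges)
share = 100 * len(singletons) / len(bgcs)
print(singletons, f"{share:.1f}% singletons")
```

Applied to the figures reported above, 227 unconnected BGCs out of 531 would correspond to roughly 42.7% of all detected clusters.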

Conclusions
Although there is a lack of bioprospecting research on the genus Gordonia, it represents a promising bioresource for discovering high-value-added microbial natural products. Several Gordonia-derived compounds have also been obtained from Streptomyces and showed diverse bioactivity potential. In addition to demonstrating applications that remain to be evaluated, Gordonia may be a source for challenging Streptomyces-derived compounds. Genome mining findings showed that species of the genus Gordonia harbor diverse types of BGCs, which, in addition to Terpenes, RiPP, NRPS, and PKS, included novel motifs that could be associated with innovative compounds or scaffolds. Gordonia is therefore a rare actinomycete genus of high value for bioprospecting studies. It is important to note that Gordonia is a relatively new genus, and most species (74.47%) have been described since 2000, so as new species are reported, the chemical space associated with Gordonia could be substantially expanded in further studies.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biotech11040053/s1, Figure S1: Gordonia-derived compounds similar to Streptomyces-derived compounds; Table S1: PRISMA-S Checklist; Table S2: List of variables used in the form to survey each article that passed the screening phase; Table S3: Database of Gordonia genomes used in this study; Table S4: Data on Gordonia-derived compounds retrieved from the articles included in the systematic review; Table S5: Clustering of Gordonia-derived compounds by similarity to Streptomyces-derived compounds; Table S6: Datawarrior and SwissADME physicochemical parameters; Table S7: Roary presence/absence table; Table S8: Similarity matrix between the core genome sequences; Table S9: Results of mining Gordonia reference genomes by the antiSMASH tool; Table S10: Results of mining G. polyisoprenivorans, G. rubripertincta, and G. terrae genomes by the antiSMASH tool; Table S11: Clustering by family of BGCs detected in Gordonia genomes by the BiG-Scape tool; Table S12: Network data by similarity of BGCs detected in Gordonia genomes built by the BiG-Scape tool.

Data Availability Statement:
The data used to support the findings of this study are provided within this article. However, any required further information can be provided by the corresponding author upon request.